oxenstored: add a safety net mechanism for existing ill-behaved clients
In the previous commit, we moved from exhaustively scanning all domain
connections to processing only those that have correctly notified us via
events. The benefit is not only efficiency but also correctness: an ill-behaved
client that fails to notify now blocks, waiting on its own mistake. Anyone who
makes this mistake while developing a piece of code would notice the problem
immediately (the process blocks) and could fix it right away, before anything
else. Such mistakes should be rare in practice, because most client code uses
the libxenstore library (which has all the notification logic built in
correctly) instead of implementing raw ring access from scratch.
On the other hand, we did notice some legacy code that doesn't do the
notification correctly. As some of that code might still be running in the
wild, it would be bad if it broke because of this change (e.g. after an
upgrade). This patch introduces a safety net mechanism to ensure ill-behaved
clients continue to work, while still retaining most of the performance
benefits.
* We add a checker that still scans all the rings periodically, so that we can
pick up these messages at an acceptable frequency.
* Internally, we introduce an io_credit concept for domain connections. It
represents the number of ring-scan rounds we are going to perform on a domain
connection. For well-behaved connections, this value alternates between 0
and 1; for connections detected as ill-behaved, we bump the credit to a
high value so that we unconditionally scan the ring for the next n rounds.
This way, the client isn't stalled by the interval between checker runs
(especially during periods when it continuously interacts with oxenstored),
and oxenstored doesn't have to keep scanning these rings indefinitely (the
credit eventually runs out), as they are usually quiet most of the time.
* We log a message when a domain connection is suspected of being ill-behaved.
Enable [info] level logging if you want to see it in action. Note that this
information isn't always accurate: false positives are possible due to timing
windows (e.g. we detect that a client has written to the ring but have not yet
received a notification from it, even though the notification could still
arrive later). There is no harm in giving a domain connection extra credit,
though.
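The credit scheme described in the list above can be illustrated with a small,
self-contained OCaml model. This is only a sketch under stated assumptions, not
the code in this patch: the record type, the function names (on_notify,
periodic_check, scan_round), the pending flag standing in for "the ring holds
unread data", and the value of bad_client_credit are all hypothetical.

```ocaml
(* Hypothetical model of the io_credit idea; not the actual oxenstored code. *)
type conn = {
  domid : int;
  mutable io_credit : int;  (* remaining rounds of ring scan *)
  mutable pending : bool;   (* stand-in for "ring holds unread data" *)
}

(* Assumed value for the "high" credit given to suspected connections. *)
let bad_client_credit = 64

let make domid = { domid; io_credit = 0; pending = false }

(* A well-behaved client notifies us: grant (at least) one round of scan. *)
let on_notify c = c.io_credit <- max c.io_credit 1

(* Periodic checker: a ring with data but no credit means the client wrote
   without notifying. Bump its credit and return the suspected domids
   (the real daemon would log these at info level). *)
let periodic_check conns =
  let suspects =
    List.filter (fun c -> c.pending && c.io_credit = 0) conns in
  List.iter (fun c -> c.io_credit <- bad_client_credit) suspects;
  List.map (fun c -> c.domid) suspects

(* One main-loop round: scan only connections that hold credit, consuming
   one credit each. Returns the domids that were scanned. *)
let scan_round conns =
  let active = List.filter (fun c -> c.io_credit > 0) conns in
  List.iter (fun c -> c.io_credit <- c.io_credit - 1; c.pending <- false)
    active;
  List.map (fun c -> c.domid) active
```

In this model a well-behaved connection moves between credit 0 and 1, while a
connection caught by periodic_check keeps being scanned in every round until
its credit runs out, so it is not stalled by the checker interval.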
Signed-off-by: Zheng Li <dev@zheng.li>
Reviewed-by: David Scott <dave.scott@citrix.com>